32 research outputs found

    Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions

    In this paper, we propose and evaluate different learning strategies based on Multi-Armed Bandit (MAB) algorithms. They allow Internet of Things (IoT) devices to improve their access to the network and their autonomy, while taking into account the impact of encountered radio collisions. To that end, several heuristics employing Upper-Confidence Bound (UCB) algorithms are examined, to explore the contextual information provided by the number of retransmissions. Our results show that approaches based on UCB obtain a significant improvement in terms of successful transmission probabilities. They also reveal that a pure UCB channel access is as efficient as more sophisticated learning strategies. Comment: The source code (MATLAB or Octave) used for the simulations and the figures is open-sourced under the MIT License, at Bitbucket.org/scee\_ietr/ucb\_smart\_retran
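
As a rough illustration of the "pure UCB channel access" mentioned in this abstract, the sketch below applies the classical UCB1 index to channel selection. It is an assumption-laden toy, not the authors' MATLAB/Octave code: the function names, the exploration constant 2, and the Bernoulli success model are all hypothetical, and retransmissions are ignored.

```python
import math
import random


def ucb1_select(successes, attempts, t):
    """Pick a channel using the UCB1 index (illustrative, hypothetical helper)."""
    # Try every channel once before trusting any empirical mean.
    for k, n in enumerate(attempts):
        if n == 0:
            return k
    # Empirical success rate plus an exploration bonus that shrinks with use.
    indices = [
        successes[k] / attempts[k] + math.sqrt(2.0 * math.log(t) / attempts[k])
        for k in range(len(attempts))
    ]
    return max(range(len(indices)), key=indices.__getitem__)


def simulate(success_probs, horizon, seed=0):
    """Simulate one device; each channel succeeds with a fixed probability (assumed)."""
    rng = random.Random(seed)
    n_channels = len(success_probs)
    successes = [0] * n_channels
    attempts = [0] * n_channels
    for t in range(1, horizon + 1):
        k = ucb1_select(successes, attempts, t)
        acked = rng.random() < success_probs[k]  # True if no collision occurred
        attempts[k] += 1
        successes[k] += int(acked)
    return attempts


if __name__ == "__main__":
    # Number of transmissions per channel after 2000 time slots.
    print(simulate([0.2, 0.5, 0.9], horizon=2000))
```

Under these assumptions the device concentrates most of its transmissions on the channel with the highest success probability, while still sampling the others occasionally.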

    Decentralized Spectrum Learning for IoT Wireless Networks Collision Mitigation

    This paper describes the principles and implementation results of reinforcement learning algorithms on IoT devices for radio collision mitigation in ISM unlicensed bands. Learning is used here to improve both the capability of the IoT network to support a larger number of objects and the autonomy of IoT devices. We first illustrate the efficiency of the proposed approach in a proof-of-concept based on USRP software radio platforms operating on real radio signals. It shows how collisions with other RF signals present in the ISM band are diminished for a given IoT device. Then we describe the first implementation of learning algorithms on LoRa devices operating in a real LoRaWAN network, which we named IoTligent. The proposed solution adds neither processing overhead, so that it can be run on the IoT devices, nor network overhead, so that no change is required to LoRaWAN. Real-life experiments have been conducted in a realistic LoRa network, and they show that the battery life of an IoTligent device can be extended by a factor of 2 in the scenarios we faced during our experiment

    Modèles de Bandits Multi-Joueurs Revisités

    Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assumes that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithm and introduce two algorithms, RandTopM and MCTopM, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called Selfish, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior

    Ce que peuvent et ne peuvent pas faire les astuces de doublement pour les bandits multi-bras

    An online reinforcement learning algorithm is anytime if it does not need to know in advance the horizon $T$ of the experiment. A well-known technique to obtain an anytime algorithm from any non-anytime algorithm is the "Doubling Trick". In the context of adversarial or stochastic multi-armed bandits, the performance of an algorithm is measured by its regret, and we study two families of sequences of growing horizons (geometric and exponential) to generalize previously known results that certain doubling tricks can be used to conserve certain regret bounds. In a broad setting, we prove that a geometric doubling trick can be used to conserve (minimax) bounds in $R_T = O(\sqrt{T})$ but cannot conserve (distribution-dependent) bounds in $R_T = O(\log T)$. We give insights as to why exponential doubling tricks may be better, as they conserve bounds in $R_T = O(\log T)$, and are close to conserving bounds in $R_T = O(\sqrt{T})$
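
To make the restart mechanism behind a doubling trick concrete, the sketch below builds the restart schedule for two families of growing horizons. The parameterizations $T_i = T_0 b^i$ (geometric) and $T_i = T_0^{b^i}$ (exponential), the default values, and all function names are illustrative assumptions; the paper's exact sequences may differ.

```python
def geometric_horizons(t0=100, b=2):
    """Yield break points T_i = t0 * b**i (assumed geometric family)."""
    i = 0
    while True:
        yield t0 * b**i
        i += 1


def exponential_horizons(t0=10, b=2):
    """Yield break points T_i = t0 ** (b**i) (assumed exponential family)."""
    i = 0
    while True:
        yield t0 ** (b**i)
        i += 1


def restart_schedule(horizons, total_steps):
    """Return (start_time, sub_horizon) pairs for each restart.

    At each break point the horizon-dependent algorithm would be
    re-initialised from scratch and told to run for sub_horizon steps.
    """
    schedule = []
    start = 0
    for break_point in horizons:
        if start >= total_steps:
            break
        schedule.append((start, break_point - start))
        start = break_point
    return schedule


if __name__ == "__main__":
    print(restart_schedule(geometric_horizons(), total_steps=1000))
    print(restart_schedule(exponential_horizons(), total_steps=1000))
```

With these hypothetical defaults, 1000 steps require 5 restarts under the geometric sequence and 3 under the exponential one. Fewer restarts mean less discarded learning, which gives some intuition for why faster-growing sequences can preserve $O(\log T)$ bounds.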

    Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

    We introduce GLR-klUCB, a novel algorithm for the piecewise i.i.d. non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free, change-point detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge of the arms' means. We prove that this algorithm can attain a $O(\sqrt{T A \Upsilon_T \log(T)})$ regret in $T$ rounds on some "easy" instances, where $A$ is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances
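
The change-point detector named in this abstract can be sketched as a Bernoulli generalized likelihood ratio scan over all split points of a reward stream. This is a minimal illustration only: the threshold $\ln(3 n \sqrt{n} / \delta)$, the default $\delta$, the clipping constant, and the function names are assumptions, and the coupling with kl-UCB is omitted.

```python
import math


def kl_bernoulli(p, q, eps=1e-15):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    # Clip to avoid log(0) when an empirical mean is exactly 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def glr_statistic(rewards):
    """Maximum over split points s of the Bernoulli GLR statistic."""
    n = len(rewards)
    total = sum(rewards)
    mean_all = total / n
    best = 0.0
    left_sum = 0.0
    for s in range(1, n):
        left_sum += rewards[s - 1]
        mean_left = left_sum / s
        mean_right = (total - left_sum) / (n - s)
        stat = s * kl_bernoulli(mean_left, mean_all) + (n - s) * kl_bernoulli(
            mean_right, mean_all
        )
        best = max(best, stat)
    return best


def detects_change(rewards, delta=0.05):
    """Flag a change when the statistic exceeds an assumed threshold."""
    n = len(rewards)
    if n < 2:
        return False
    threshold = math.log(3.0 * n * math.sqrt(n) / delta)  # assumed form
    return glr_statistic(rewards) > threshold


if __name__ == "__main__":
    print(detects_change([0] * 50 + [1] * 50))  # abrupt shift in the mean
    print(detects_change([1, 0] * 50))          # stationary stream
```

Because the scan compares the empirical means on both sides of every candidate split against the overall mean, it needs no prior knowledge of the arms' means, in line with the parameter-free property the abstract emphasizes.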

    Genome of Herbaspirillum seropedicae Strain SmR1, a Specialized Diazotrophic Endophyte of Tropical Grasses

    The molecular mechanisms of plant recognition, colonization, and nutrient exchange between diazotrophic endophytes and plants are scarcely known. Herbaspirillum seropedicae is an endophytic bacterium capable of colonizing intercellular spaces of grasses such as rice and sugar cane. The genome of H. seropedicae strain SmR1 was sequenced and annotated by The Paraná State Genome Programme—GENOPAR. The genome is composed of a circular chromosome of 5,513,887 bp and contains a total of 4,804 genes. The genome sequence revealed that H. seropedicae is a highly versatile microorganism with capacity to metabolize a wide range of carbon and nitrogen sources and with possession of four distinct terminal oxidases. The genome contains a multitude of protein secretion systems, including type I, type II, type III, type V, and type VI secretion systems, and type IV pili, suggesting a high potential to interact with host plants. H. seropedicae is able to synthesize indole acetic acid as reflected by the four IAA biosynthetic pathways present. A gene coding for ACC deaminase, which may be involved in modulating the associated plant ethylene-signaling pathway, is also present. Genes for hemagglutinins/hemolysins/adhesins were found and may play a role in plant cell surface adhesion. These features may endow H. seropedicae with the ability to establish an endophytic life-style in a large number of plant species